On the Convergence of Decentralized Gradient Descent

Authors

  • Kun Yuan
  • Qing Ling
  • Wotao Yin
Abstract

Consider the consensus problem of minimizing f(x) = ∑_{i=1}^n f_i(x), where each f_i is known only to an individual agent i belonging to a connected network of n agents. All the agents must solve this problem collaboratively, obtaining the solution through data exchanges between neighboring agents only. Such algorithms avoid the need for a fusion center, offer better network load balance, and improve data privacy. We study the decentralized gradient descent method, in which each agent i updates its variable x_(i), a local approximation of the unknown variable x, by averaging its neighbors' variables and then taking a local negative gradient step −α∇f_i(x_(i)). The iteration is x_(i)(k+1) ← ∑_j w_ij x_(j)(k) − α∇f_i(x_(i)(k)) for each agent i, where the coefficients w_ij form a symmetric doubly stochastic matrix W = [w_ij] ∈ ℝ^{n×n}. Since agent i does not communicate with non-neighbors, w_ij ≠ 0 only if i = j or j is a neighbor of i. We analyze the convergence of this iteration and derive its rate, assuming that each f_i is proper, closed, convex, and lower bounded, that each ∇f_i is Lipschitz continuous with constant L_{f_i}, and that the stepsize α is fixed. Provided that α < O(1/L_h), where L_h = max_i {L_{f_i}}, the objective error at the averaged solution, f((1/n)∑_i x_(i)(k)) − f*, where f* is the optimal objective value, decreases at a rate of O(1/k) until it reaches O(α). If the f_i are (restricted) strongly convex, then both (1/n)∑_i x_(i)(k) and each x_(i)(k) converge to the global minimizer x* at a linear rate until reaching an O(α)-neighborhood of x*. We also develop an iteration for decentralized basis pursuit and establish its linear convergence to an O(α)-neighborhood of the true sparse signal. This analysis reveals how convergence depends on the stepsize, function convexity, and network spectrum.
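
To make the update rule concrete, here is a minimal Python sketch of the iteration above. The setup is illustrative and not taken from the paper: quadratic local objectives f_i(x) = ½‖A_i x − b_i‖² (so ∇f_i(x) = A_iᵀ(A_i x − b_i)), a ring network, and Metropolis weights as one standard recipe for a symmetric doubly stochastic W. The helper metropolis_weights and all problem data are hypothetical.

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric doubly stochastic mixing matrix from an adjacency matrix,
    built with the Metropolis rule: w_ij = 1 / (1 + max(deg_i, deg_j))."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        W[i, i] = 1.0 - W[i].sum()  # diagonal takes the slack so each row sums to 1
    return W

rng = np.random.default_rng(0)
n, m, d = 10, 3, 5                  # agents, measurements per agent, dimension
A = rng.standard_normal((n, m, d))  # agent i privately holds A[i], b[i]
x_true = rng.standard_normal(d)
b = np.einsum('imd,d->im', A, x_true) + 0.1 * rng.standard_normal((n, m))

# Ring network: agent i exchanges data only with neighbors i-1 and i+1 (mod n).
adj = np.zeros((n, n), dtype=bool)
for i in range(n):
    adj[i, (i - 1) % n] = adj[i, (i + 1) % n] = True
W = metropolis_weights(adj)

alpha = 0.01            # fixed stepsize, kept below 1/max_i L_{f_i}
X = np.zeros((n, d))    # row i is agent i's local copy x_(i)

for k in range(5000):
    # grad f_i(x_(i)) = A_i^T (A_i x_(i) - b_i), evaluated for all agents at once
    resid = np.einsum('imd,id->im', A, X) - b
    grads = np.einsum('imd,im->id', A, resid)
    # x_(i)(k+1) = sum_j w_ij x_(j)(k) - alpha * grad f_i(x_(i)(k))
    X = W @ X - alpha * grads

# Compare the averaged solution with the centralized least-squares minimizer.
x_star, *_ = np.linalg.lstsq(A.reshape(n * m, d), b.ravel(), rcond=None)
x_bar = X.mean(axis=0)
print("distance to x*:", np.linalg.norm(x_bar - x_star))
```

Because the stepsize is fixed, the averaged iterate settles in an O(α) neighborhood of the minimizer x* rather than converging to it exactly; shrinking α tightens the neighborhood but slows progress, which is the trade-off quantified in the abstract.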

Related Articles

A decentralized proximal-gradient method with network independent step-sizes and separated convergence rates

This paper considers the problem of decentralized optimization with a composite objective containing smooth and nonsmooth terms. To solve the problem, a proximal-gradient scheme is studied: the smooth and nonsmooth terms are handled by a gradient update and a proximal update, respectively. The studied algorithm is closely related to a previous decentralized optimization algorithm,...

D$^2$: Decentralized Training over Decentralized Data

When training a machine learning model using multiple workers, each of which collects data from its own data sources, it is most useful when the data collected by different workers are unique and different. Ironically, recent analysis of decentralized parallel stochastic gradient descent (D-PSGD) relies on the assumption that the data hosted on different workers are not too differ...

A new Levenberg-Marquardt approach based on Conjugate gradient structure for solving absolute value equations

In this paper, we present a new approach for solving the absolute value equation (AVE) which uses the Levenberg-Marquardt method with a conjugate subgradient structure. In conjugate subgradient methods, the new direction is obtained by combining the steepest descent direction and the previous direction, which may not lead to good numerical results. Therefore, we replace the steepest descent dir...

Asynchronous Decentralized Parallel Stochastic Gradient Descent

Recent work shows that decentralized parallel stochastic gradient descent (D-PSGD) can outperform its centralized counterpart both theoretically and practically. While asynchronous parallelism is a powerful technique for improving the efficiency of parallelism in distributed machine learning platforms and has been widely used in many popular machine learning software and solvers based on centrali...

An eigenvalue study on the sufficient descent property of a modified Polak-Ribière-Polyak conjugate gradient method

Based on an eigenvalue analysis, a new proof for the sufficient descent property of the modified Polak-Ribière-Polyak conjugate gradient method proposed by Yu et al. is presented.

Two Settings of the Dai-Liao Parameter Based on Modified Secant Equations

Following the setting of the Dai-Liao (DL) parameter in conjugate gradient (CG) methods, we introduce two new parameters based on the modified secant equation proposed by Li et al. (Comput. Optim. Appl. 202:523-539, 2007) with two approaches, which use an extended new conjugacy condition. The first is based on a modified descent three-term search direction, as the descent Hest...


Journal:
  • SIAM Journal on Optimization

Volume 26, Issue 

Pages -

Publication date: 2016